Guide for setting up GCP
This guide helps you set up your Google Cloud Platform (GCP) project to run with DataStori.
DataStori runs your pipeline code in Google Cloud Run Jobs and integrates with your GCP project using IAM service accounts.
Requirements
Please be ready with the following resources:
VPC/Networking Requirements
- VPC Network Name: The VPC where you want to run your code.
- Subnet Name: The subnet where you want to run your code.
- VPC Firewall Rules: Ensure your firewall rules allow the egress the pipeline needs; you can scope access via network tags. The lookup commands below can help you find the network and subnet names.
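If it helps, the following gcloud commands list the networks and subnets available in a project so you can pick the right names; the region value is an example placeholder.

```sh
# List VPC networks in the current project
gcloud compute networks list

# List subnets in the region where the jobs will run
# (replace us-central1 with your region)
gcloud compute networks subnets list --regions=us-central1
```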
Service Requirements
- Cloud Run: DataStori will spin up Cloud Run Jobs to run the pipeline and store the output in Google Cloud Storage. Please ensure the Cloud Run API is enabled in your project (see the command after this list).
- Google Cloud Storage (GCS): Data is stored here. Please be ready with the Bucket Name where you want the data to be stored.
- RDBMS (optional): Connection details for any other RDBMS you want DataStori to push data to.
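As a convenience, the required APIs can also be enabled from the CLI; this sketch assumes you want the Cloud Run and Cloud Storage APIs turned on together.

```sh
# Enable the Cloud Run and Cloud Storage APIs in your project
gcloud services enable run.googleapis.com storage.googleapis.com
```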
IAM / Service Accounts
We will create two service accounts:
- One for the Cloud Run Job to use (workload identity).
- One for DataStori to impersonate to manage the jobs.
1. Create a Service Account for the Job: This service account will be used by the Cloud Run Job itself to access GCS.
   - Navigate to IAM & Admin -> Service Accounts -> Create Service Account.
   - Name it datastori-job-runner-sa.
   - Grant this service account the Storage Object Admin (roles/storage.objectAdmin) role so it can write to your GCS bucket.
   - Click Done and note down the email of this service account. (Equivalent gcloud commands follow this step.)
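For reference, a minimal gcloud sketch of this step; the project ID and bucket name are placeholders, and the role is granted here at the bucket level rather than project-wide.

```sh
# Create the service account the Cloud Run Job will run as
gcloud iam service-accounts create datastori-job-runner-sa \
    --project=YOUR_PROJECT_ID \
    --display-name="DataStori Job Runner"

# Allow it to write pipeline output to your GCS bucket
gcloud storage buckets add-iam-policy-binding gs://YOUR_BUCKET_NAME \
    --member="serviceAccount:datastori-job-runner-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"
```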
2. Create a Service Account for DataStori: This is the service account that DataStori's infrastructure will authenticate as.
   - Create another service account named datastori-integration-sa.
   - Do not grant this service account any roles directly in this step. Click Done.
   - Note down the email of this service account. (A gcloud equivalent follows this step.)
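A gcloud equivalent, again assuming a placeholder project ID:

```sh
# Create the integration service account; no roles are granted yet
gcloud iam service-accounts create datastori-integration-sa \
    --project=YOUR_PROJECT_ID \
    --display-name="DataStori Integration"
```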
3. Create a Custom Role: This role will contain the specific permissions DataStori needs to manage Cloud Run Jobs and use the job runner service account.
   - Navigate to IAM & Admin -> Roles -> Create Role.
   - Give it a title like "DataStori Job Manager".
   - Add the following permissions:
     - run.jobs.run
     - run.jobs.get
     - run.jobs.list
     - run.executions.get
     - run.executions.list
     - run.executions.delete
     - iam.serviceAccounts.actAs (allows passing the runner SA to the job)
   - Click Create. (A gcloud sketch follows this step.)
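The same role can be created from the CLI; the role ID dataStoriJobManager is an example, not a required name.

```sh
# Create a custom role carrying the permissions listed above
gcloud iam roles create dataStoriJobManager \
    --project=YOUR_PROJECT_ID \
    --title="DataStori Job Manager" \
    --permissions=run.jobs.run,run.jobs.get,run.jobs.list,run.executions.get,run.executions.list,run.executions.delete,iam.serviceAccounts.actAs
```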
4. Bind Permissions: (gcloud equivalents follow this step)
   - On the project level: Go to IAM & Admin -> IAM and grant the datastori-integration-sa the custom "DataStori Job Manager" role you just created.
   - On the job runner service account: Navigate to the datastori-job-runner-sa you created in step 1, open the Permissions tab, and grant the datastori-integration-sa the Service Account User (roles/iam.serviceAccountUser) role. This allows the integration account to "act as" the runner account.
   - Allow DataStori to impersonate: Navigate to the datastori-integration-sa, open the Permissions tab, and grant DataStori's main service account principal the Service Account Token Creator (roles/iam.serviceAccountTokenCreator) role. Please ask customer support for DataStori's principal service account email.
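These three bindings as gcloud commands; the project ID and role ID are placeholders from the sketches above, and DATASTORI_PRINCIPAL_EMAIL is the principal you receive from customer support.

```sh
# 1. Project level: grant the integration SA the custom role
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
    --member="serviceAccount:datastori-integration-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
    --role="projects/YOUR_PROJECT_ID/roles/dataStoriJobManager"

# 2. Let the integration SA act as the job runner SA
gcloud iam service-accounts add-iam-policy-binding \
    datastori-job-runner-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
    --member="serviceAccount:datastori-integration-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
    --role="roles/iam.serviceAccountUser"

# 3. Let DataStori's principal impersonate the integration SA
gcloud iam service-accounts add-iam-policy-binding \
    datastori-integration-sa@YOUR_PROJECT_ID.iam.gserviceaccount.com \
    --member="serviceAccount:DATASTORI_PRINCIPAL_EMAIL" \
    --role="roles/iam.serviceAccountTokenCreator"
```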
Logging (Optional)
By default, DataStori writes pipeline logs to Google Cloud Logging. If you want to customize the logging destination, please share the details of your log sink configuration; the commands below can help you gather them.
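A sketch for pulling the sink details from the CLI; the sink name is a placeholder.

```sh
# List the log sinks configured in your project
gcloud logging sinks list

# Show the full configuration of a specific sink
gcloud logging sinks describe YOUR_SINK_NAME
```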
Summary
To proceed with the GCP setup, please provide the following:
- GCP Project ID
- VPC Network Name
- Subnet Name
- GCS Bucket Name
- GCS Bucket Region
- The email of the datastori-integration-sa (service account for DataStori)
- The email of the datastori-job-runner-sa (service account for the job)
- Cloud Logging sink details (optional)
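If useful, most of these values can be pulled from the CLI; the bucket name is a placeholder, and the regex filter is just one way to match the two service accounts created above.

```sh
# Current project ID
gcloud config get-value project

# Emails of the two service accounts created in this guide
gcloud iam service-accounts list --filter="email ~ ^datastori-"

# Bucket region
gcloud storage buckets describe gs://YOUR_BUCKET_NAME --format="value(location)"
```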